Before we begin, start a new file with File \(\rightarrow\) New File \(\rightarrow\) R Script. As you work through this sheet in the R console, also add (copy/paste) the commands that work into this new file. At the end, save the file and run it to execute all of your commands at once.
The gapminder package uses a small snippet of this data for exploratory analysis. Install and load the package gapminder. Type ?gapminder and hit enter to see a description of the data.

Take a look at gapminder to see what we’re dealing with.

Structure of the gapminder data:
## tibble [1,704 × 6] (S3: tbl_df/tbl/data.frame)
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ year : int [1:1704] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ lifeExp : num [1:1704] 28.8 30.3 32 34 36.1 ...
## $ pop : int [1:1704] 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
## $ gdpPercap: num [1:1704] 779 821 853 836 740 ...
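
The structure shown above could be produced with something like the following sketch. It assumes the gapminder package is available from CRAN; the install line is commented out so the script can be re-run without reinstalling each time.

```r
# Run once if the package is not installed yet
# install.packages("gapminder")

library(gapminder)

# Open the help page for the dataset (only useful in an interactive session)
# ?gapminder

# Print the first rows of the tibble, then its structure
gapminder
str(gapminder)
```

str() reports each column's type alongside its first few values, which is a quick way to confirm that country and continent are factors and the rest are numeric.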
Summary statistics of all variables:
##         country        continent        year         lifeExp
##  Afghanistan:  12   Africa  :624   Min.   :1952   Min.   :23.60
##  Albania    :  12   Americas:300   1st Qu.:1966   1st Qu.:48.20
##  Algeria    :  12   Asia    :396   Median :1980   Median :60.71
##  Angola     :  12   Europe  :360   Mean   :1980   Mean   :59.47
##  Argentina  :  12   Oceania : 24   3rd Qu.:1993   3rd Qu.:70.85
##  Australia  :  12                  Max.   :2007   Max.   :82.60
##  (Other)    :1632
##       pop              gdpPercap
##  Min.   :6.001e+04   Min.   :   241.2
##  1st Qu.:2.794e+06   1st Qu.:  1202.1
##  Median :7.024e+06   Median :  3531.8
##  Mean   :2.960e+07   Mean   :  7215.3
##  3rd Qu.:1.959e+07   3rd Qu.:  9325.5
##  Max.   :1.319e+09   Max.   :113523.1
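
A minimal sketch of the call behind the summary table above, assuming gapminder is already installed. summary() also accepts a single column, which is handy when only one variable is of interest; the name life_summary is just an illustrative choice.

```r
library(gapminder)

# Summary statistics for every column of the data frame
summary(gapminder)

# The same function applied to a single numeric column
life_summary <- summary(gapminder$lifeExp)
life_summary
```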
With ggplot2, use a histogram to visualize the distribution of a variable. Make a histogram of gdpPercap. Your only aesthetic here is to map gdpPercap to x.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
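
One way the histogram could be written is sketched below. The object name p_hist and the bin width of 2500 are illustrative assumptions, not values taken from the worksheet.

```r
library(gapminder)
library(ggplot2)

# Histogram with the default 30 bins; this triggers the stat_bin() message
p_hist <- ggplot(data = gapminder,
                 aes(x = gdpPercap)) +
  geom_histogram()
p_hist

# Setting binwidth explicitly silences the message.
# 2500 is an arbitrary value chosen only for illustration.
ggplot(data = gapminder,
       aes(x = gdpPercap)) +
  geom_histogram(binwidth = 2500)
```

Every observation falls into exactly one bin, so the bar heights of p_hist add up to the number of rows in the data.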
Add an aesthetic that maps continent to fill.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
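
A sketch of the filled histogram, assuming the same setup as before (p_fill is a hypothetical object name). Mapping continent to fill splits the data into one group per continent, and by default the groups are stacked within each bin.

```r
library(gapminder)
library(ggplot2)

# fill = continent goes inside aes() because it maps a variable to an aesthetic
p_fill <- ggplot(data = gapminder,
                 aes(x = gdpPercap,
                     fill = continent)) +
  geom_histogram()
p_fill
```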
Mapping continent to color as a global aesthetic applies it to every layer, so geom_smooth fits a separate regression line for each continent. If we want just one regression line, we instead need to move color = continent inside the aes of geom_point. This maps continent to color only for the points, not for anything else.

ggplot(data = gapminder,
       aes(x = gdpPercap,
           y = lifeExp)) +
  geom_point(aes(color = continent)) +
  geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

Set the color of the regression line to "black". Try first by putting this inside an aes() in your geom_smooth, and try a second time by putting it inside geom_smooth without an aes(). What’s the difference, and why?

ggplot(data = gapminder,
       aes(x = gdpPercap,
           y = lifeExp)) +
  geom_point(aes(color = continent,
                 size = pop)) +
  geom_smooth(aes(color = "black"))
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

# Inside aes(), "black" is treated as data: a constant categorical value
# that happens to be the string "black". ggplot2 gives it a color from the
# default discrete palette and adds it to the legend, so the line is not
# drawn in black.

ggplot(data = gapminder,
       aes(x = gdpPercap,
           y = lifeExp)) +
  geom_point(aes(color = continent,
                 size = pop)) +
  geom_smooth(color = "black")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

# Outside aes(), color = "black" sets the line color directly.
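
Both ideas, where an aesthetic is mapped and whether a value is mapped or set, can be checked by inspecting the data ggplot2 builds for each layer. The sketch below is one way to do that. It uses method = "lm" instead of the default smoother purely to keep the fit light, which is an assumption that differs from the plots above, and all object names are hypothetical.

```r
library(gapminder)
library(ggplot2)

base <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp))

# (1) Global vs. local mapping of continent to color
p_global <- ggplot(gapminder,
                   aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  geom_smooth(method = "lm")   # inherits color = continent
p_local <- base +
  geom_point(aes(color = continent)) +
  geom_smooth(method = "lm")   # sees no color mapping

# Number of groups the smoothing layer (layer 2) was fitted on
n_groups_global <- length(unique(layer_data(p_global, 2)$group))  # one per continent
n_groups_local  <- length(unique(layer_data(p_local, 2)$group))   # a single group

# (2) Mapping vs. setting a color
p_mapped <- base + geom_smooth(aes(color = "black"), method = "lm")
p_set    <- base + geom_smooth(color = "black", method = "lm")

colour_mapped <- unique(layer_data(p_mapped)$colour)  # a default palette color
colour_set    <- unique(layer_data(p_set)$colour)     # the literal "black"

c(global = n_groups_global, local = n_groups_local)
c(mapped = colour_mapped, set = colour_set)
```

layer_data() returns the computed data for one layer, so the group and colour columns show directly what the smoother was fitted on and what color it will be drawn in.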
Faceting: add + facet_wrap(~continent) to create subplots by continent.

ggplot(data = gapminder,
       aes(x = gdpPercap,
           y = lifeExp)) +
  geom_point(aes(color = continent,
                 size = pop)) +
  geom_smooth(color = "black") +
  facet_wrap(~continent)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Remove the facet layer. The x-axis scale is awkward: many points are clustered at the low end of gdpPercap. Let’s try changing the scale by adding a layer: + scale_x_log10().

ggplot(data = gapminder,
       aes(x = gdpPercap,
           y = lifeExp)) +
  geom_point(aes(color = continent,
                 size = pop)) +
  geom_smooth(color = "black") +
  scale_x_log10()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
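
Why the log scale helps can be seen from the numbers themselves. The short sketch below (variable names are illustrative) compares the spread of gdpPercap on the original scale with its spread in powers of ten.

```r
library(gapminder)

gdp <- gapminder$gdpPercap

# On the original scale the largest value is hundreds of times the smallest
range(gdp)
max(gdp) / min(gdp)

# On a log10 scale the same data spans fewer than three powers of ten
log_span <- diff(range(log10(gdp)))
log_span
```

scale_x_log10() only changes where values are positioned along the axis; the axis labels still show GDP per capita in its original units.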
Add labels with + labs(). Inside labs, give proper axis titles for x and y, and a title for the plot. To rename the legends (continent color, population size), add entries for color and size.

ggplot(data = gapminder,
       aes(x = gdpPercap,
           y = lifeExp)) +
  geom_point(aes(color = continent,
                 size = pop)) +
  geom_smooth(color = "black") +
  scale_x_log10() +
  labs(x = "GDP per Capita",
       y = "Life Expectancy",
       color = "Continent",
       size = "Population")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
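
The instructions also ask for a plot title, which the code above does not set. A sketch of how one could be added is below; the title text and the object names plot_labels and p_labeled are hypothetical, chosen only for illustration.

```r
library(gapminder)
library(ggplot2)

# Collect all labels in one object; the title string is a made-up example
plot_labels <- labs(x = "GDP per Capita",
                    y = "Life Expectancy",
                    title = "Income and Life Expectancy",
                    color = "Continent",
                    size = "Population")

p_labeled <- ggplot(data = gapminder,
                    aes(x = gdpPercap,
                        y = lifeExp)) +
  geom_point(aes(color = continent,
                 size = pop)) +
  geom_smooth(color = "black") +
  scale_x_log10() +
  plot_labels
p_labeled
```

Each argument to labs() is named after the aesthetic whose axis or legend title it replaces, which is why color and size rename the two legends.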
Subset the gapminder dataframe to only look at continent == "Americas". Assign this to a new dataframe object (call it something like america). Now, use this as your data, and redo the graph from question 17. (You might want to take a look at your new dataframe to make sure it worked first!)

# assumes `america` already holds the rows of gapminder where continent == "Americas"
ggplot(data = america,
       aes(x = gdpPercap,
           y = lifeExp)) +
  geom_point(aes(color = continent,
                 size = pop)) +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
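
The plot above relies on an america object that the code never creates. One possible way to build and check it, using base R's subset() so that no extra package is assumed, is sketched here.

```r
library(gapminder)

# Keep only the rows where continent is "Americas"
america <- subset(gapminder, continent == "Americas")
# An equivalent with dplyr, if it is loaded:
# america <- dplyr::filter(gapminder, continent == "Americas")

# Check that the subset worked: a single continent should remain
unique(america$continent)
nrow(america)
head(america)
```

The row count should match the Americas entry in the continent column of the summary table, which is a quick sanity check before plotting.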